
Structure of a JPEG File

mithryn


JPEG files have a predictable structure. Each segment of information
(whether header metadata or the image data itself) is delimited by
well-known two-byte hex values called “markers”: 0xFF followed by a
byte that identifies the segment type. The general structure is as follows:
JPEG Image


Byte value          Description
0xFF 0xD8           Start of Image (SOI), the first two bytes of a JPEG file.
0xFF [segment ID]   A marker indicating a new segment. Each segment type has a unique ID byte.
0xFF 0xD9           End of Image (EOI), the last two bytes of the file.
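As a quick illustration of the SOI/EOI framing, here is a minimal Python check. The function name looks_like_jpeg is hypothetical. The check is deliberately strict: some real files carry trailing bytes after EOI and would fail the end test.

```python
# Sketch only: looks_like_jpeg is an illustrative name, not a library function.
SOI = b"\xff\xd8"  # Start of Image marker
EOI = b"\xff\xd9"  # End of Image marker


def looks_like_jpeg(data: bytes) -> bool:
    """Strict sanity check: data must begin with SOI and end with EOI."""
    return len(data) >= 4 and data.startswith(SOI) and data.endswith(EOI)
```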

Some other common JPEG segment markers:

Byte value   Name    Description
0xFF 0xE0    APP0    JFIF application segment (common, but not in every JPEG; Exif files store their metadata in APP1)
0xFF 0xDB    DQT     Define Quantization Table
0xFF 0xC0    SOF0    Start of Frame (baseline)
0xFF 0xC4    DHT     Define Huffman Table
0xFF 0xDA    SOS     Start of Scan (the compressed image data follows)
0xFF 0xED    APP13   Photoshop storage (the one we need)
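Because every segment before Start of Scan carries a two-byte length right after its marker, the segments can be walked in order. The sketch below (iter_segments and MARKER_NAMES are hypothetical names) yields each segment's marker byte and payload, and stops after SOS because compressed image data follows it. It assumes a well-formed file and does not handle the few markers that have no length field.

```python
# Sketch only: names are illustrative; assumes a well-formed JPEG.
import struct
from typing import Iterator, Tuple

# Second marker byte -> name, taken from the table above.
MARKER_NAMES = {
    0xE0: "APP0", 0xDB: "DQT", 0xC0: "SOF0",
    0xC4: "DHT", 0xDA: "SOS", 0xED: "APP13",
}


def iter_segments(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (marker_byte, payload) for each header segment, up to and including SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # extra 0xFF fill byte before a marker; skip it
            pos += 1
            continue
        if marker == 0xD9:  # EOI reached without an SOS
            return
        # Big-endian length: counts its own two bytes but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        end = pos + 2 + length
        if end > len(data):
            raise ValueError("segment runs past end of file")
        yield marker, data[pos + 4:end]
        pos = end
        if marker == 0xDA:  # compressed scan data follows SOS
            return
```

For example, [MARKER_NAMES.get(m, hex(m)) for m, _ in iter_segments(data)] lists the header segments of a file in order.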

The information we’re interested in, the IPTC keywords, is stored in the APP13 segment, 0xFF 0xED (application markers are numbered upward from APP0 at 0xFF 0xE0, so 0xFF 0xED is APP13). The APP13 segment has the following structure:
APP13 Segment


Byte value          Description
0xFF 0xED           Start of the APP13 segment.
2 bytes             Segment size (big-endian), excluding the marker but including these two bytes.
Photoshop 3.0\x00   A fixed 14-byte identifier string, ending in a null byte.
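The length rule is easy to get wrong by two bytes, so here is a minimal sketch of building and unwrapping an APP13 segment. build_app13 and app13_body are hypothetical helper names.

```python
# Sketch only: helper names are illustrative, not from any library.
import struct

PHOTOSHOP_ID = b"Photoshop 3.0\x00"  # the fixed 14-byte identifier


def build_app13(body: bytes) -> bytes:
    """Wrap 8BIM data in a complete APP13 segment: marker, length, identifier, body."""
    payload = PHOTOSHOP_ID + body
    length = len(payload) + 2  # the length field counts its own two bytes
    if length > 0xFFFF:
        raise ValueError("payload too large for a single APP13 segment")
    return b"\xff\xed" + struct.pack(">H", length) + payload


def app13_body(segment: bytes) -> bytes:
    """Inverse of build_app13: check marker, length and identifier, return the 8BIM data."""
    if segment[:2] != b"\xff\xed":
        raise ValueError("not an APP13 segment")
    (length,) = struct.unpack(">H", segment[2:4])
    payload = segment[4:2 + length]
    if not payload.startswith(PHOTOSHOP_ID):
        raise ValueError("APP13 segment without the Photoshop identifier")
    return payload[len(PHOTOSHOP_ID):]
```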

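After the fixed string, the rest of the APP13 segment is a sequence of 8BIM blocks (Photoshop image resource blocks), each holding one field. An 8BIM block has the following structure:

Field          Size       Description
8BIM           4 bytes    Block signature (literally the ASCII string "8BIM").
Resource ID    2 bytes    Identifies the block type; IPTC data uses 0x04 0x04.
Name           2+ bytes   Pascal string padded to an even length; usually empty, i.e. two 0x00 bytes.
Data size      4 bytes    Big-endian length of the data only, excluding everything before it.
Data           variable   The block's contents, followed by a 0x00 pad byte if the size is odd.

With an empty name and a data size under 64 KiB, the bytes between the resource ID and the data look like four zero bytes followed by a two-byte size.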
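A sketch of walking the 8BIM blocks in an APP13 body (the bytes after the Photoshop 3.0\x00 string), honoring the even-length padding of both the name and the data. iter_8bim is a hypothetical name, and the code assumes well-formed input.

```python
# Sketch only: iter_8bim is an illustrative name; input is assumed well-formed.
from typing import Iterator, Tuple


def iter_8bim(body: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (resource_id, data) for each 8BIM block in an APP13 body."""
    pos = 0
    while pos + 4 <= len(body) and body[pos:pos + 4] == b"8BIM":
        res_id = int.from_bytes(body[pos + 4:pos + 6], "big")
        # Name: a length byte plus that many characters, padded to an even total.
        name_len = body[pos + 6]
        pos += 7 + name_len + ((name_len + 1) % 2)
        size = int.from_bytes(body[pos:pos + 4], "big")  # 4-byte big-endian size
        yield res_id, body[pos + 4:pos + 4 + size]
        pos += 4 + size + (size % 2)  # data is padded to an even length
```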

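The data of the IPTC block (resource ID 0x0404) consists of
subsegments, called datasets in the IPTC Information Interchange
Model, laid out as follows:

Byte value   Description
0x1C 0x02    Subsegment marker: the 0x1C tag byte followed by record number 2 (the Application Record).
Type         1 byte giving the dataset number, i.e. which field this is.
Size         2 bytes (big-endian), excluding the marker, type, and size bytes.
Data         The field value.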
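The datasets can be walked the same way. This sketch (iter_iptc is a hypothetical name) yields the record number, dataset number, and value of each dataset. It rejects a size with the high bit set, which IIM uses to signal an extended-length dataset that the two-byte layout above does not cover.

```python
# Sketch only: iter_iptc is an illustrative name; extended-length datasets are rejected.
from typing import Iterator, Tuple


def iter_iptc(data: bytes) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (record, dataset, value) for each 0x1C-tagged IPTC dataset."""
    pos = 0
    while pos + 5 <= len(data) and data[pos] == 0x1C:
        record, dataset = data[pos + 1], data[pos + 2]
        size = int.from_bytes(data[pos + 3:pos + 5], "big")
        if size & 0x8000:
            raise ValueError("extended-length dataset not supported by this sketch")
        yield record, dataset, data[pos + 5:pos + 5 + size]
        pos += 5 + size
```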

The IPTC keywords themselves are stored in these subsegments, specifically those of type 0x19 (dataset 2:25, Keywords). An image can contain several keyword subsegments, one per keyword, because the standard allows the field to repeat.
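Putting the three layers together, a read-only sketch that returns every keyword in a file might look like this. read_keywords is a hypothetical name, and decoding the values as UTF-8 is an assumption: the character set is declared separately in the IPTC data (if at all), and some files use another encoding.

```python
# Sketch only: read_keywords is an illustrative name; UTF-8 decoding is assumed.
from typing import List


def read_keywords(jpeg: bytes) -> List[str]:
    """Return every IPTC keyword (dataset 2:25) stored in a JPEG's APP13 segment(s)."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    keywords: List[str] = []
    pos = 2
    # Layer 1: JPEG segments, up to Start of Scan.
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker == 0xDA:
            break
        end = pos + 2 + int.from_bytes(jpeg[pos + 2:pos + 4], "big")
        payload, pos = jpeg[pos + 4:end], end
        if marker != 0xED or not payload.startswith(b"Photoshop 3.0\x00"):
            continue
        body, i = payload[14:], 0
        # Layer 2: 8BIM blocks; only the IPTC block (ID 0x0404) matters here.
        while i + 4 <= len(body) and body[i:i + 4] == b"8BIM":
            res_id = int.from_bytes(body[i + 4:i + 6], "big")
            name_len = body[i + 6]
            i += 7 + name_len + ((name_len + 1) % 2)  # skip the padded name
            size = int.from_bytes(body[i:i + 4], "big")
            data = body[i + 4:i + 4 + size]
            i += 4 + size + (size % 2)
            if res_id != 0x0404:
                continue
            # Layer 3: IPTC datasets; keep record 2, dataset 0x19.
            j = 0
            while j + 5 <= len(data) and data[j] == 0x1C:
                n = int.from_bytes(data[j + 3:j + 5], "big")
                if data[j + 1] == 0x02 and data[j + 2] == 0x19:
                    keywords.append(data[j + 5:j + 5 + n].decode("utf-8"))
                j += 5 + n
    return keywords
```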

A program that edits these keywords must therefore do the following:

  1. Parse the header to find whether an APP13 segment exists and, if so, whether it contains the "Photoshop 3.0" identifier, an 8BIM block with resource ID 0x0404, and 0x1C 0x02 0x19 subsegments.
  2. If it finds 0x19 subsegments, read them and present them to the user, so the user knows which keywords are already assigned.
  3. If the user deletes, changes, or adds keywords, write the entire image to a new file on save, with recalculated lengths for every subsegment and every parent segment.
    • Each subsegment has a length, as do the enclosing 8BIM block and the APP13
      segment, so a change at the innermost level must be propagated to every level above it.
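The three steps, including the length recalculation at every level, can be sketched as one function that takes the original file's bytes and returns the bytes to write to the new file. All names here (set_keywords, rebuild_body, build_iptc) are hypothetical. The sketch keeps other 8BIM blocks and non-keyword datasets, assumes at most one Photoshop APP13 segment and no extended-length datasets, encodes keywords as UTF-8, and inserts the rebuilt APP13 segment just before the first non-APPn segment.

```python
# Sketch only: names are illustrative; assumes one Photoshop APP13 segment at most.
import struct
from typing import List

PHOTOSHOP_ID = b"Photoshop 3.0\x00"


def build_iptc(old: bytes, keywords: List[str]) -> bytes:
    """Keep every non-keyword dataset from old, then append one 2:25 dataset per keyword."""
    out, pos = bytearray(), 0
    while pos + 5 <= len(old) and old[pos] == 0x1C:
        size = int.from_bytes(old[pos + 3:pos + 5], "big")
        if not (old[pos + 1] == 0x02 and old[pos + 2] == 0x19):
            out += old[pos:pos + 5 + size]
        pos += 5 + size
    for kw in keywords:
        value = kw.encode("utf-8")  # assumption: UTF-8 keywords
        # Each dataset's 2-byte size is computed fresh here.
        out += b"\x1c\x02\x19" + struct.pack(">H", len(value)) + value
    return bytes(out)


def rebuild_body(old_body: bytes, keywords: List[str]) -> bytes:
    """Re-emit all 8BIM blocks, replacing (or adding) the IPTC block 0x0404."""
    blocks, pos, found = [], 0, False
    while pos + 4 <= len(old_body) and old_body[pos:pos + 4] == b"8BIM":
        res_id = int.from_bytes(old_body[pos + 4:pos + 6], "big")
        name_len = old_body[pos + 6]
        name_end = pos + 7 + name_len + ((name_len + 1) % 2)
        name = old_body[pos + 6:name_end]
        size = int.from_bytes(old_body[name_end:name_end + 4], "big")
        data = old_body[name_end + 4:name_end + 4 + size]
        pos = name_end + 4 + size + (size % 2)
        if res_id == 0x0404:
            data, found = build_iptc(data, keywords), True
        blocks.append((res_id, name, data))
    if not found:
        blocks.append((0x0404, b"\x00\x00", build_iptc(b"", keywords)))
    out = bytearray()
    for res_id, name, data in blocks:
        # Recalculate each block's 4-byte size and re-pad the data to even length.
        out += b"8BIM" + struct.pack(">H", res_id) + name
        out += struct.pack(">I", len(data)) + data + b"\x00" * (len(data) % 2)
    return bytes(out)


def set_keywords(jpeg: bytes, keywords: List[str]) -> bytes:
    """Return the bytes of a new JPEG whose APP13 segment holds exactly these keywords."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    segments, insert_at, old_body, pos = [], None, b"", 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker == 0xDA:  # SOS: the rest of the file is copied verbatim
            break
        end = pos + 2 + int.from_bytes(jpeg[pos + 2:pos + 4], "big")
        payload = jpeg[pos + 4:end]
        if marker == 0xED and payload.startswith(PHOTOSHOP_ID):
            old_body = payload[len(PHOTOSHOP_ID):]  # old APP13 is dropped
        else:
            if insert_at is None and not 0xE0 <= marker <= 0xEF:
                insert_at = len(segments)
            segments.append(jpeg[pos:end])
        pos = end
    payload = PHOTOSHOP_ID + rebuild_body(old_body, keywords)
    if len(payload) + 2 > 0xFFFF:
        raise ValueError("keyword data too large for one APP13 segment")
    # The segment length counts its own two bytes but not the 0xFF 0xED marker.
    new_seg = b"\xff\xed" + struct.pack(">H", len(payload) + 2) + payload
    segments.insert(len(segments) if insert_at is None else insert_at, new_seg)
    return b"\xff\xd8" + b"".join(segments) + jpeg[pos:]
```

Writing the result to a new file is then a single call, for example Path("out.jpg").write_bytes(set_keywords(Path("in.jpg").read_bytes(), ["beach"])) using pathlib's Path. Rebuilding every length from the innermost dataset outward, rather than patching the old length fields, avoids off-by-two mistakes when a keyword changes size.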
